Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network
Authors
Abstract
Following recent advances in direct modeling of the speech waveform using a deep neural network, we propose a novel method that directly estimates a physical model of the vocal tract from the speech waveform, rather than magnetic resonance imaging data. This provides a clear relationship between the model and the size and shape of the vocal tract, offering considerable flexibility in terms of speech characteristics such as age and gender. Initial tests indicate that despite a highly simplified physical model, intelligible synthesized speech is obtained. This illustrates the potential of the combined technique for the control of physical models in general, and hence the generation of more natural-sounding synthetic speech.
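The paper estimates a digital waveguide mesh of the vocal tract directly from the waveform. As a much simpler illustration of the underlying signal-processing idea, the sketch below simulates a uniform tube as a 1-D digital waveguide: two delay rails carrying right- and left-going pressure waves, with reflections at the glottis and lip ends. All function names, parameter values, and the tube shape are illustrative assumptions, not taken from the paper (which uses a 2-D/3-D mesh).

```python
def waveguide_tube(n_sections, excitation, n_samples,
                   r_glottis=-0.99, r_lip=-0.99):
    """Lossless 1-D digital waveguide: two delay rails with reflection
    coefficients at the glottis (left) and lip (right) ends.
    NOTE: illustrative sketch only, not the paper's 2-D/3-D mesh."""
    fwd = [0.0] * n_sections  # right-going pressure wave
    bwd = [0.0] * n_sections  # left-going pressure wave
    out = []
    for t in range(n_samples):
        x = excitation[t] if t < len(excitation) else 0.0
        f_end = fwd[-1]   # wave arriving at the lip end
        b_end = bwd[0]    # wave arriving at the glottis end
        # propagate each rail one sample and apply end reflections;
        # the excitation is injected at the glottis end
        fwd = [x + r_glottis * b_end] + fwd[:-1]
        bwd = bwd[1:] + [r_lip * f_end]
        out.append(fwd[-1] + bwd[-1])  # pressure radiated at the lip
    return out
```

With a round-trip delay of `2 * n_sections` samples, the tube resonates near `fs / (2 * n_sections)`; a full mesh generalizes this scattering to two or three dimensions so the cross-sectional shape of the tract can be varied.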
Similar papers
Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis
Articulatory information has been shown to be effective in improving the performance of hidden Markov model (HMM)-based text-to-speech (TTS) synthesis. Recently, deep learning-based TTS has outperformed HMM-based approaches. However, articulatory information has rarely been integrated in deep learning-based TTS. This paper investigated the effectiveness of integrating articulatory movement data t...
Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings
Automatic prediction of articulatory movements from speech or text can be beneficial for many applications such as speech recognition and synthesis. A recent approach has reported state-of-the-art performance in speech-to-articulatory prediction using feed-forward neural networks. In this paper, we investigate the feasibility of using bidirectional long short-term memory based recurrent neural n...
Data driven articulatory synthesis with deep neural networks
The conventional approach for data-driven articulatory synthesis consists of modeling the joint acoustic-articulatory distribution with a Gaussian mixture model (GMM), followed by a post-processing step that optimizes the resulting acoustic trajectories. This final step can significantly improve the accuracy of the GMM frame-by-frame mapping but is computationally intensive and requires that th...
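The conventional GMM mapping this abstract refers to computes, frame by frame, the minimum mean-square error estimate E[y | x] of the acoustic features y given articulatory features x under a joint GMM. The sketch below shows that mapping for scalar features; the mixture parameters and function name are illustrative assumptions, and the trajectory-optimization post-processing step the abstract mentions is not shown.

```python
import math

def gmm_frame_map(x, comps):
    """MMSE frame-by-frame mapping E[y | x] under a joint GMM over
    scalar (x, y) pairs. Each component of `comps` is a tuple
    (weight, mu_x, mu_y, var_x, cov_xy). Illustrative sketch only."""
    liks, cond_means = [], []
    for w, mu_x, mu_y, var_x, cov_xy in comps:
        # weighted marginal likelihood of x under this component
        lik = (w * math.exp(-0.5 * (x - mu_x) ** 2 / var_x)
               / math.sqrt(2 * math.pi * var_x))
        liks.append(lik)
        # conditional mean of y given x for this component
        cond_means.append(mu_y + (cov_xy / var_x) * (x - mu_x))
    total = sum(liks)
    # posterior-weighted average of the per-component conditional means
    return sum(l * m for l, m in zip(liks, cond_means)) / total
```

For example, a single component with `mu_x=0, mu_y=1, var_x=1, cov_xy=0.5` maps `x = 2.0` to `1 + 0.5 * 2 = 2.0`. Applying this independently per frame is what produces the discontinuous trajectories that the post-processing step is meant to smooth.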
Robust articulatory speech synthesis using deep neural networks for BCI applications
Brain-Computer Interfaces (BCIs) usually propose typing strategies to restore communication for paralyzed and aphasic people. A more natural way would be to use a speech BCI directly controlling a speech synthesizer. Toward this goal, a prerequisite is the development of a synthesizer that should i) produce intelligible speech, ii) run in real time, iii) depend on as few parameters as possible, and ...
Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Silent speech recognition (SSR) converts non-audio information such as articulatory (tongue and lip) movements to text. Articulatory movements generally have less information than acoustic features for speech recognition, and therefore, the performance of SSR may be limited. Multiview representation learning, which can learn better representations by analyzing multiple information sources simul...